iPAS Exam Preparation Notes - AI Application Planner
TLDR
- AI Fundamentals: AI, ML, and DL share a nested relationship; current mainstream commercial AI belongs to "Narrow AI."
- Data Engineering: Data Lakehouse combines the flexibility of a data lake with the governance capabilities of a data warehouse; the Medallion Architecture (Bronze/Silver/Gold) is the standard model for tiered data management.
- Data Processing: ELT (load raw data first, then transform inside the target platform) is gradually replacing ETL because it preserves raw-data detail for AI training.
- Data Governance: Data Mesh addresses the scaling bottlenecks of centralized platforms through domain-oriented ownership.
- Feature Engineering: The choice of categorical encoding depends on cardinality and model type (One-Hot, Target, WoE, etc.); numerical features usually need scaling (Z-score standardization, Robust Scaling), especially for distance- and gradient-based models.
- Model Evaluation: For class imbalance issues, prioritize metric selection (F1, AUC, MCC) and decision threshold adjustment rather than relying solely on Accuracy.
- Deep Learning: The Transformer architecture is the cornerstone of modern NLP; CNNs excel at extracting spatial features from images; Diffusion Models are the mainstream approach to image generation.
- AI Governance: The EU AI Act adopts risk-based management; AI systems must ensure fairness, explainability, and security, utilizing Model Cards and Datasheets for transparency.
AI Fundamental Concepts
AI Capability Levels and Classification
Artificial Intelligence (AI) refers to technologies that enable machines to simulate intelligent human behavior. Current AI systems (e.g., ChatGPT, AlphaGo) are "Narrow AI," characterized by:
- No Autonomous Goal Setting: Can only respond to prompts or external tasks.
- No Persistent Memory: Does not autonomously accumulate experience across conversations; the underlying model is not updated by individual interactions.
- Limited Cross-Domain Transfer: Performance depends on massive training data and post-training processes, and does not transfer reliably to unfamiliar domains.
AI functions can be categorized as Analytical, Predictive, Generative, and Prescriptive (recommending the best course of action).
AI, Machine Learning, and Deep Learning
The three are nested: DL is a subset of ML, and ML is a subset of AI.
- AI: Any technology that allows machines to exhibit intelligent behavior.
- ML: Learns patterns automatically from data instead of relying on explicitly programmed rules.
- DL: A subset of ML that uses multi-layer neural networks to extract features automatically.
Data Engineering
Data Storage Architecture
- Data Warehouse: Structured data, Schema-on-Write, suitable for reporting.
- Data Lake: Raw data, Schema-on-Read, suitable for exploration.
- Data Lakehouse: Combines the two; supports ACID transactions and version tracking; suitable for reporting, ML, and RAG.
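The Schema-on-Write vs. Schema-on-Read distinction can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical two-field `SCHEMA` and JSON-lines raw storage; `write_with_schema` and `read_with_schema` are illustrative names, not a real warehouse or lake API.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}

def write_with_schema(table: list, record: dict) -> None:
    """Schema-on-Write (warehouse style): validate BEFORE storing; reject bad rows."""
    for field, typ in SCHEMA.items():
        if not isinstance(record.get(field), typ):
            raise ValueError(f"rejected: {field!r} must be {typ.__name__}")
    table.append(record)

def read_with_schema(raw_lines: list) -> list:
    """Schema-on-Read (lake style): raw lines are stored as-is; the schema is applied at read time."""
    rows = []
    for line in raw_lines:
        obj = json.loads(line)
        try:
            rows.append({f: typ(obj[f]) for f, typ in SCHEMA.items()})
        except (KeyError, TypeError, ValueError):
            continue  # this reader skips rows that don't fit; another reader could interpret them differently
    return rows

# Usage
warehouse = []
write_with_schema(warehouse, {"order_id": 1, "amount": 9.5})
lake = ['{"order_id": "2", "amount": "3.0"}', '{"note": "free text"}']
parsed = read_with_schema(lake)
```

The warehouse rejects malformed data at ingestion, while the lake accepts everything and defers interpretation to each consumer, which is why the lake suits exploration and the warehouse suits reporting.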
Medallion Architecture
- Bronze: Raw data, kept in its original form.
- Silver: Cleaned, deduplicated, and standardized; shared across business units.
- Gold: Business consumption layer of pre-aggregated, ready-to-use datasets.
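A minimal sketch of the three Medallion layers as plain Python functions, assuming hypothetical point-of-sale events with `event_id`, `store`, and `qty` fields. Real lakehouses implement these layers as tables (e.g., in Spark or SQL); the function names and cleaning rules here are illustrative only.

```python
from collections import defaultdict

def to_bronze(raw_events: list) -> list:
    """Bronze: store raw records unchanged (copied so later layers never mutate them)."""
    return [dict(e) for e in raw_events]

def to_silver(bronze: list) -> list:
    """Silver: type-cast, standardize, drop invalid rows, and deduplicate by event_id."""
    seen, silver = set(), []
    for e in bronze:
        try:
            row = {
                "event_id": str(e["event_id"]),
                "store": e["store"].strip().title(),  # " taipei " / "TAIPEI" -> "Taipei"
                "qty": int(e["qty"]),
            }
        except (KeyError, ValueError, AttributeError):
            continue  # a real pipeline would usually quarantine these rows instead
        if row["event_id"] not in seen:
            seen.add(row["event_id"])
            silver.append(row)
    return silver

def to_gold(silver: list) -> dict:
    """Gold: pre-aggregated business metric (units sold per store)."""
    totals = defaultdict(int)
    for row in silver:
        totals[row["store"]] += row["qty"]
    return dict(totals)

# Usage with hypothetical events
raw = [
    {"event_id": 1, "store": " taipei ", "qty": "2"},
    {"event_id": 1, "store": " taipei ", "qty": "2"},      # duplicate
    {"event_id": 2, "store": "TAIPEI", "qty": "3"},
    {"event_id": 3, "store": "kaohsiung", "qty": "oops"},  # invalid quantity
    {"event_id": 4, "store": "Kaohsiung", "qty": "5"},
]
gold = to_gold(to_silver(to_bronze(raw)))
```

Because Bronze keeps every raw record, Silver and Gold can be rebuilt if the cleaning rules change later.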
Data Governance
- Data Mesh: Decentralizes data ownership to business domains, managed through self-service infrastructure and federated governance.
- Data Catalog/Metadata/Lineage: Solves the problems of "findability," "understandability," and "traceability" of data, respectively.
Feature Engineering
Categorical Feature Encoding Selection
- One-Hot: Suitable for features with few categories and no inherent order; usable with both linear and tree models, but the matrix becomes wide and sparse as cardinality grows.
- Ordinal: Suitable for features with a clear order (e.g., education level).
- Target Encoding: Suitable for high-cardinality features, but must guard against Data Leakage (e.g., out-of-fold encoding, smoothing toward the global mean).
- WoE (Weight of Evidence): Standard practice for binary classification in finance, e.g., credit scoring.
- Feature Hashing: Suitable for streaming data or memory-constrained scenarios.
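The encodings above can be sketched in pure Python to show what each one computes. These are illustrative helpers, not the scikit-learn or category_encoders APIs; the modulo-based fold split, the smoothing formula, the WoE sign convention (y == 1 treated as "bad"), and the `eps` adjustment are all assumptions chosen for simplicity.

```python
import hashlib
import math
from collections import defaultdict

def one_hot(values):
    """One-Hot: one 0/1 column per distinct category (sorted for a stable column order)."""
    cats = sorted(set(values))
    return cats, [[1 if v == c else 0 for c in cats] for v in values]

def ordinal(values, order):
    """Ordinal: map each category to its rank in an explicit, domain-defined order."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]

def target_encode_oof(values, y, n_folds=2, smoothing=1.0):
    """Out-of-fold Target Encoding: each row is encoded using only the OTHER folds'
    targets, shrunk toward that fold's prior: (sum_y + smoothing * prior) / (count + smoothing).
    Folds are assigned by row index modulo n_folds (a simplification; shuffle in practice)."""
    encoded = [0.0] * len(values)
    for k in range(n_folds):
        train_idx = [i for i in range(len(values)) if i % n_folds != k]
        prior = sum(y[i] for i in train_idx) / len(train_idx)
        sums, counts = defaultdict(float), defaultdict(int)
        for i in train_idx:
            sums[values[i]] += y[i]
            counts[values[i]] += 1
        for i in range(len(values)):
            if i % n_folds == k:
                v = values[i]
                encoded[i] = (sums[v] + smoothing * prior) / (counts[v] + smoothing)
    return encoded

def woe(values, y, eps=0.5):
    """Weight of Evidence per category: ln(share of goods / share of bads).
    Assumes y == 1 means 'bad' (e.g. default); eps is an additive adjustment to avoid log(0)."""
    goods, bads = defaultdict(int), defaultdict(int)
    for v, t in zip(values, y):
        if t == 1:
            bads[v] += 1
        else:
            goods[v] += 1
    total_good, total_bad = sum(goods.values()), sum(bads.values())
    return {c: math.log(((goods[c] + eps) / total_good) / ((bads[c] + eps) / total_bad))
            for c in set(values)}

def hash_feature(value, n_buckets=8):
    """Feature Hashing: map a category to one of n_buckets without storing a vocabulary.
    md5 is used because it is stable across runs (Python's built-in str hash is salted)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets
```

Out-of-fold encoding applies to the training set; validation and test rows would be encoded with statistics from the full training set, so their own targets never leak into their features.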
Data Quality and Imbalance Handling
- Six Dimensions of Data Quality: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity.
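- Imbalance Handling:
  - SMOTE: Suitable for numerical features; generates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors. Apply it to the training split only.
  - Decision Threshold Adjustment: Tunes the classification threshold after training; needs no retraining, making it the most cost-effective option.
  - Anomaly Detection: When the class ratio is extreme (e.g., 99.99:0.01), reframe the task as anomaly detection with Isolation Forest or One-Class SVM.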
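A sketch of two of the imbalance techniques, plus the F1 and MCC metrics that are more informative than plain Accuracy when judging them. `smote_like` is a simplified SMOTE-style interpolation (not the imbalanced-learn implementation), and `best_threshold` assumes the scores come from an already-trained model evaluated on a validation set; all names and toy data are hypothetical.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: pick a minority point x, pick one of its k nearest
    minority neighbours nb, and create x + u * (nb - x) with u uniform in [0, 1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic

def confusion(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    return tp, fp, fn, len(y_true) - tp - fp - fn

def f1(tp, fp, fn, tn):
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient: uses all four cells, robust to imbalance."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_threshold(y_true, scores, metric=f1):
    """Post-training threshold tuning: try each observed score as a cut-off, keep the best."""
    return max(sorted(set(scores)), key=lambda t: metric(*confusion(y_true, scores, t)))

# Usage on hypothetical validation scores (6 negatives, 2 positives)
y_val = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.45, 0.4, 0.6]
t = best_threshold(y_val, scores)
```

Here the default 0.5 cut-off misses one of the two positives, while the tuned threshold catches both at the cost of one false positive, raising F1 without retraining.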
Machine Learning Algorithms
Supervised Learning
- Linear Models: Logistic Regression outputs probabilities, suitable for binary classification.
- Decision Trees: Make predictions via split rules; high explainability, but single trees are prone to overfitting.
- SVM: Finds decision boundaries via Maximum Margin, suitable for high-dimensional, small-sample data.
- Ensemble Learning:
  - Bagging (Random Forest): Reduces Variance.
  - Boosting (XGBoost, LightGBM, CatBoost): Reduces Bias, improves predictive power.
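To show how Logistic Regression turns a linear score into a probability, here is a minimal batch-gradient-descent fit on the log-loss. The learning rate, epoch count, and toy data are assumptions; in practice a library such as scikit-learn's `LogisticRegression` would be used instead.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # The log-loss gradient w.r.t. the linear score is simply (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    """Probability of the positive class; a threshold (0.5 by default) turns it into a label."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Usage: one feature, overlapping classes (toy data)
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(X, y)
```

Because the output is a probability, the decision threshold can be tuned afterwards, which is what makes threshold adjustment a cheap remedy for class imbalance.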
Unsupervised Learning
- K-Means: Assumes roughly spherical clusters; requires specifying K in advance.
- DBSCAN: Density-based clustering, automatically identifies noise points, no need to specify the number of clusters.
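K-Means and DBSCAN can be contrasted with small pure-Python sketches: K-Means needs K up front, while DBSCAN instead needs a radius `eps` and a minimum neighbourhood size `min_pts`, and labels outliers as noise. The initialization strategy, parameter values, and toy points are assumptions for illustration.

```python
import math

NOISE = -1

def kmeans(points, k, iters=100):
    """Lloyd's algorithm for K-Means. Initial centroids are simply the first k points
    (a simplification; k-means++ initialization is common in practice)."""
    centroids = [tuple(map(float, p)) for p in points[:k]]
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[c])
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return centroids, labels

def dbscan(points, eps, min_pts):
    """DBSCAN: one label per point; NOISE (-1) marks outliers. The cluster count is discovered."""
    def region(i):
        # Neighbourhood of point i within radius eps, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:  # not a core point (may later become a border point)
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:  # previously noise, now a border point of this cluster
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbours)
    return labels

# Usage: two tight groups plus one far-away outlier
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centroids, km_labels = kmeans(pts[:6], k=2)
db_labels = dbscan(pts, eps=1.5, min_pts=3)
```

Given all seven points, K-Means would have to place the outlier in some cluster, whereas DBSCAN leaves it as noise without being told how many clusters exist.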
Deep Learning and Model Architecture
- CNN: Convolutional layers extract local features, suitable for image processing.
- RNN/LSTM: Processes sequential data; LSTM uses gating mechanisms to mitigate the vanishing gradient problem.
- Transformer: Based on the Self-Attention mechanism, supports parallel computing, and is the foundation of modern LLMs.
- Diffusion Model: Generates high-quality images by learning to reverse a gradual noising process, denoising step by step.
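The Self-Attention mechanism at the heart of the Transformer can be sketched as single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The projection matrices `Wq`, `Wk`, `Wv` are supplied by hand here (an assumption; real models learn them), and masking, multiple heads, and positional encodings are omitted.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors X.
    Each token's attention row depends only on Q and K, so all rows can be
    computed in parallel (unlike an RNN's step-by-step recurrence)."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k) for kj in K] for qi in Q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights

# Usage: three 2-d tokens, identity projections for readability
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(X, I, I, I)
```

Each output row is a weighted mix of all value vectors, so every token can draw on context from the whole sequence in a single step.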
AI Governance and Security
AI Governance Framework
- EU AI Act: A risk-based management framework that prohibits AI practices posing unacceptable risk and strictly regulates high-risk AI systems.
- NIST AI RMF: Structures AI risk management around four core functions: Govern, Map, Measure, Manage.
- ISO/IEC 42001: International standard for AI management systems, emphasizing accountability and continuous improvement.
Security Protection
- Prompt Injection: Defense focuses on separating trusted instructions from untrusted data (e.g., user input, retrieved documents).
- Privacy Protection: Differential Privacy injects calibrated noise into data or query results; Federated Learning trains models locally so raw data never leaves the local environment.
- Explainability (XAI): SHAP and LIME are the mainstream tools for post-hoc explanation of black-box models.
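Differential Privacy can be illustrated with the Laplace mechanism, one standard way to make a numeric query epsilon-differentially private (the notes do not name a mechanism, so this choice is an assumption). A counting query has sensitivity 1, since adding or removing one person changes the count by at most 1, so Laplace noise with scale 1/epsilon suffices. `private_count` is a hypothetical helper name.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """epsilon-DP count via the Laplace mechanism (count sensitivity = 1, so scale = 1/epsilon)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Usage: 100 hypothetical records, true answer = 50
rng = random.Random(42)
records = list(range(100))
answer = private_count(records, lambda r: r % 2 == 0, epsilon=1.0, rng=rng)
```

Note the trade-off: a smaller `epsilon` gives stronger privacy but noisier answers, while averaging many releases of the same query would erode the guarantee, so each release consumes privacy budget.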
Change Log: 2026-05-20 Initial document created.